Document Categorization Using Latent Semantic Indexing
Abstract
Contents: Foreword by Andrew Sieja, kCura; How LSI Works; Training; Test Corpus and Performance Measures; Comparison with Other Techniques; Real-World Applications and Lessons; Information Filtering/Knowledge Discovery; Document Categorization/Prioritization; Lessons Learned; Epilogue by Jay Leib, kCura
Similar resources
Big Data Categorization for Arabic Text Using Latent Semantic Indexing and Clustering
Document categorization is an important field in the area of natural language processing. In this paper, we propose using Latent Semantic Indexing (LSI), singular value decomposition (SVD), and clustering techniques to group similar unlabeled documents into a pre-specified number of topics. The generated groups are then categorized using a suitable label. For clustering, we used Expectation–...
Support Vector Machines for Text Categorization Based on Latent Semantic Indexing
Text Categorization (TC) is an important component in many information organization and information management tasks. Two key issues in TC are feature coding and classifier design. This paper describes an approach to text categorization via Support Vector Machines (SVMs) based on Latent Semantic Indexing (LSI). Latent Semantic Indexing [1][2] is a method for selecting informative subspaces of featu...
Supervised Locality Preserving Indexing for Text Categorization
A major characteristic of text categorization problems is the prohibitively high dimensionality of the feature space. Most discrimination methods cannot work under such conditions, so Latent Semantic Indexing (LSI) has been adopted to solve this problem. However, LSI is not an optimal representation for the text categorization task, mainly for two reasons: first, the discriminative categorical info...
Music Genre Classification Using Text Categorization Method
Automatic music genre classification is one of the most challenging problems in music information retrieval and the management of digital music databases. In this paper, we propose a new method to classify music genres using text categorization methods. Differing from previous solutions, which were mainly based on analysis of acoustic or symbolic audio signals, here we consider music as a text-like se...
SMART Electronic Legal Discovery Via Topic Modeling
Electronic discovery is an interesting subproblem of information retrieval in which one identifies documents that are potentially relevant to issues and facts of a legal case from an electronically stored document collection (a corpus). In this paper, we consider representing documents in a topic space using well-known topic models such as latent Dirichlet allocation and latent semantic in...